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Abstract 

It is known that the Kimura 3ST model of sequence evolution on phylogenetic trees can be extended 
quite naturally to arbitrary split systems. However, this extension relies heavily on mathematical 
peculiarities of the K3ST model, and providing an analogous augmentation of the general Markov 
model has thus far been elusive. In this paper we rectify this shortcoming by showing how to 
extend the general Markov model on trees to to include arbitrary splits; and even further to 
more general network models. This is achieved by exploring the algebra of the generators of the 
continuous-time Markov chain together with the "splitting" operator that generates the branching 
process on phylogenetic trees. For simplicity we proceed by discussing the two state case and note 
that our results are easily extended to more states with little complication. Intriguingly, upon 
restriction of the two state general Markov model to the parameter space of the binary symmetric 
model, our extension is indistinguishable from the previous approach only on trees; as soon as any 
incompatible splits are introduced the two approaches give rise to differing probability distributions 
with disparate structure. Through exploration of a simple example, we give a tentative argument 
that our approach to extending to more general networks has desirable properties that the previous 
approaches do not share. In particular, our construction allows for the possibility of convergent 
evolution of previously divergent lineages; a property that is of significant interest for biological 
applications. 
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1 Introduction 



Phylogenetic methods seek to infer the prior evolutionary relationships of extant taxa. Classically, 
this was achieved by comparing morphological features, but modern methods focus on molecular 
data such as DNA. Harking back to sketches in Darwin's early notebooks, it has also been as- 
sumed that evolutionary history resembles a tree structure. However, it is now well known that 
evolutionary processes such as hybridisation, deep coalescence (incomplete lineage-sorting), hori- 
zontal gene transfer and recombination cannot be accurately modelled as a tree. Even when the 
underlying historical signal fits a tree, there may be conflicting non-historical signals caused by 
sampling error, long-branch attraction, nucleotide composition bias, or changes in the substitu- 
tion rate at individual sites across the tree, as well as alignment or misreading errors. Incorrect 
or over-parameterized models of sequence mutations can lead to high statistical support for splits 
that are incompatible with a single tree. 

It is clear that imposing a strictly treelike evolutionary history may be inappropriate in the 
situations described above, hence methods that can assist in identifying and understanding conflict 
in phylogenetic data are essential. One class of methods that have proved useful in this respect 
are weighted split-systems and their corresponding visualization as networks. Split networks ini- 
tially arose as means o f visualizing the split decomposition of a distance metric as defined by 
Bandelt fc DressI (|l992l ). In that work, the authors gave a decomposition that provides a way 



of assessing whether the structure of a distance matrix is treelike or if it contains other con- 
flicting signals. However, the idea of a weighted split-system is v ery general and has arisen in 
many phylogenetic contexts. These include (1) median networks ( Bandelt . Il994l) . where splits 
and their weights are derived from a bina ry coding of a sequence alignment; (2) Hadamard (or 
spectral) analysis ( Hendv fc Pennv . 1989[ ). which defines an invertible relationship between site 
patte rns and a split spectrum u nder certain simple models (K3ST and subclasses); (3) Neighbor- 
Net (jBrvant fc Moulto"n . 2004 ). a distance-based method which applies a greedy agglomerative 
algorithm to find a circular ordering of taxa and then a least-squar es approach to find weight s 



for the corresponding set of circular splits; (4) Consensus networks (jHolland fc Moultonl . 120041 ) 



which take a set of trees and define a weighted split-system based on th e number of trees th at 
display a particular edge, possibly incorpora ting edge weight i nforin ation ( Holland et ai . 2006[ ). 



With the exception of spectral analysis Hendv fc Pennv ( 19891 ) these methods are all com- 
binatorial and/or distance-based; there is currently no way to infer a weighted split-system in 
the likelihood setting using general Markov models of sequence evolution. That said, there has 
been some previous work on calculating likelih o od sco res for particular phylogenetic networks un- 
der special models. Ivon Haeseler fc Churchilll ()1993( ) developed a framework for computing the 
likeli hood of a split-system for bin ary sequences under a Cavender-Farris model with invariable 
sites. Strimmer fc Moulton developed a Bayesian approach that calculated the likelihood 

of a giv en directed acyc lic graph (DAG) for more complex models of sequence evolution. More 
recentlv lJin et al. ( 2006[ ) defined a likelihood score for phylogenetic networks as a weighted mix- 
ture of tree likelihoods. All of these methods begin with a given phylogenetic network and then 
attempt to calculate its likelihood under some model. This is very different from the Hadamard 
based approach - and the new approach we explore here - which begin with the data and a model 
and infer a weighted split-system. 

Given their importance, there is a distinct lack of an extension of the standard Markov mod- 
els on trees to arbitrary split systems. In recent work, Brvant ( 2005a . 20091 ) has re-examined 
the nature of the Kimura 3ST and binary-symmetric model as, under a simple extension, these 
models permit the inclusion of arbitrary splits over and above those that come from a single tree. 
However, these so called "group-based" models of sequence evolution are not motivated by bi- 
ological considerations and hence their validity in applied studies must be scrutinized carefully. 
The primary motivation for these models is mathematical elegance and simplicity, so that it is 
not necessarily the case that the underlying assumptions have biological relevance. Specifically, 
all group-based models have doubly-stochastic rate matrices and thus uniform stationary distri- 
butions. This is clearly ina ppropriate given tha. t varying GC content is known to be of crucial 



importance in phylogenetics (jJermiin et a/.l . l2004l) . It appears that to date it has not been possible 
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Figure 1: A rooted tree. 



to employ general model-based methods to infer split networks from phylogenetic data sets. 

In this article we show how Markov models of phylogenetic evolution on trees (thought of as 
compatible split systems) can be generalized to the case of arbitrary split systems. In the binary 
case of two character states, we achieve this by studying the algebra of the generators of the 
continuous time Markov chain together with the "splitting" operator that generates the branching 
process on phylogenetic trees. The resulting presentation of the general Markov rate matrix model 
on a tree is such that it can be generalized in a natural way to include arbitrary splits; including 
those that are incompatible with any tree. This results in a very general model that contains the 
standard tree model as a special case, but has the potential to associate an individual weight and 
rate matrix to any additional splits that we wish to include. Additionally, we show by example that 
our approach gives rise to the possibility of Markov models on much more general networks, with 
phylogenetic evolution proceeding in a series of "epochs" consisting of divergence or convergence of 
arbitrary groups of taxa (ie. lineages). As part of the discussion, we note that our results are fully 
generalizable to any number of character states with complication of detail only. Intriguingly, we 
will also show that under a restriction of the parameter space to the bi nary-symmetri c case that 
this model is not consistent with the Hadamard based approach given bv lBrvantl(|2009f l. We close 
with a simple example that shows our construction has the ability to model convergent evolution 
of lineages; a property that is simply not available to the Hadamard based approach. 



2 Preliminaries 

In this article, a tree with vertex set X is an acyclic graph with vertices chosen from X such that 
all vertices have valence of exactly 3 or 1. A rooted tree is an acyclic graph as above but with a 
single vertex p (the root) having valence 2. Thus, a rooted tree is a collection of edges e € (^), 
and can be made to be a directed graph by considering e = (w, v) as an ordered pair of adjacent 
vertices, where u lies on the path from v to p. Vertices of valence 1 are referred to as leaves and 
we label the leaves from elements of the set [n] :— {1,2,..., n}. For a rooted tree, we label each 
non-leaf vertex v by the subset of leaves such that the path from each of these leaves to p contains 
V. Additionally, we label each edge e = {u,v) by the subset that labels the vertex v. In this way, 
the edges are labelled by subsets of [n], with pendant edges labelled by singletons. See Figure [T] 
for a graphical representation of a rooted tree. 

Consider the vector space V = with the ordered basif0 

{io)...:^(;). 

With respect to this basis, we can define "Markov generators" as the zero column-sum matrices 

^"=(10)' ^ ( -1 ) ■ 

^We use "Dirac notation", where a vector is represented by a "ket" |}, as this notation is particularly elegant 
when it comes to more general phylogenetic character patterns. 
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In this way, the general rate matrix for a continuous-time Markov chain on two states can be 
expressed as the hnear combinatior0 



Q = aha + PLfj = 



—a 
a 



The associated transition matrix [M{t)\^^^ representing the probabihty of a transistion j 
time is then given by the exponential map 

M (t) = exp [Qt] 



(1) 



i at 



n=0 



LnL 



-L 



/3' 



By noting that the generators satisfy the relations 
L\ — LpLa = —L 
it is easy to show that 

Thus 



(2) 



Mit) = exp m = J2 



= 1 - 



1 



-1^ (e-("+«* - l) (aL^+pLp) 



(3) 



As M{t) is invariant under the reparameterizatiord 



a, 



we see that we can "scale out" t by choosing X = t^^. As, in a practical context, a, (3 and even t are 
unknown parameters that must be inferred from observed data using some statistical estimation 
procedure, we see that we can take 



M(a,/3) 



as completely equiv alent to pi. I f we think of M(a,/3) as a two-dimensional manifold (in the 
sense of a Lie group ( Procesi 12007 )). then we see that the Markov generators are none other than 
the basis vectors of the tangent space at the identity: 



d 



M{a, (3) 



d_ 



M(a,/3) 



a=P=0 



with algebraic closure across the "Lie bracket" [Lq;,L^] := L„L s — LsLn 



jfi e nsurmg 



"closure" of the corresponding Markov model (as is discussed in iJarvis fc Sumner (|2010h ). This 
connection between continu ous time Marko v chains and Lie groups is an important one and seems 
to have been first noted bv I JohnsonI (|l985f ). This point of view is needed in order to extend the 
results of the present article to the case of character state spaces of arbitrary size. Having given 
this perspective into the meaning of the Markov generators, we from will nevertheless take the 
more usual representation ([3]) of transition matrices in all that follows below. 



^An amusing aside: Our rate matrices have zero column- rather than roui-sum, as we Uke to conform with 
physicists' notation of right matrix multiplication. This sits well psychologically with the physical picture that a 
linear operator "hits" a vector and it "moves", which, in turn, is in tune with the left-to-right direction that one 
reads printed English. 

^For further discussion of local ti me-rep aramctcrization in phylogenetics see lJarvis" et al\ ||2005|') or, in the context 
of a changing rate of mutation, see iPennvl lf2005i ) 
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In Sumner fc JarvisI ( 2005h it was shown that the Markov models of phylogenetics in standard 



use can be represented in an abstract setting using the tensor product space T^(8)y (g). . where 
dim(V^) = fc is the number of character states and the number of copies of V is equal to the number 
of taxa under consideration. In these models, it is usual to impose conditional independence across 
the branches of the tree and this can be formalized using a linear operator 6 : V ^ V ® V to 
generate speciation events. This is referred to as the "splitting operator" and is defined, using our 
chosen basis, as 

6-\i) = \i) (g) \i) , 

which, expressed at the level of the probability distributions, corresponds exactly to the duplication 
of a sequence of molecular units. That is, if we select state i from the initial sequence with 
probability pi, then immediately after duplication - and assuming the sequences remain aligned 
- the probability of observing the pattern ij at a given site is pidij. By defining the vector 
p := J2iPi I*) £ ^ ^-iid noting that the tensors {\ij) := \i) |j)}o<i j<i form a basis for V ®V, 
the splitting operator achieves this notion of speciation in the abstract setting: 

S ■p = S ■ I ^Pi \i) I = ^Pi^ ■ |i) = I'i-i) = 1*-^^ ' 

\ i / i i i,j 

where P :^ S ■ p £ V V has components piSij and is referred to as a "phylogenetic tensor". 
Subsequently, given two rate matrices Qi and Q2, and two edge weights ri and T2, the phylogenetic 
tensor evolves to 

P' ^e'^i^i (g)e'5^^^ -P, 
which, up to first order terms in the edge weights, is 

P' = [(1 + ainL^ + f3iTiLp + . . .) ® (1+ a2T2La + f32T2Lfl + ...)]■ P 

= [l + aiTiLa 1+ a2T2l 'S>La+ PiTiLp (g) 1 + ^2T2l ® L/3 + ...]• P. 

In this way, the splitting operator can be thought of as the generator of the branching pattern 
of the phylogenetic tree, whi le and Lfi are the g enerators of the Markov proce ss. (For more 
details of this formalism see iBas hford etdl (|2004V bumner & Jarvid (|2005D and ISumner et al 

ee lJarvis 



(|2008l ). and for an even more general setting see lJarvis e t al. (2005)). Presently, we are concerned 



with the algebra resulting from application of these two types of generators. 



3 Some helpful lemmas 



Lp |0) = 0, 
L^|1) = |0)-|1). 



The action of the Markov generators on the basis vectors is 

ijO) = |l)-|0), 
= 0; 

By comparing the relations 



and 



(4) 



5 ■ La 


|0) = 


= lll)- 


- |oo) , 


s 


La € 


5 La 


00) = 


|11)- 


101)- 


La € 


3 La 


11) = 


0, 




La 


(g) 1 


|00) = 


|10)- 


|00), 


La 


«) 1 


lll) = 


0, 




l(g 


3 La 


00) = 


101)- 


|00), 


1(S 


3 La 


11) = 


0; 





with similar for i^, we find that the Markov generators can be "pushed through" past the splitting 
operator: 
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Lemma 3.1. As operators from V to V (i?) V , we have 

S ■ La — {La (8)ia + -^Q<8)l + l(8l La) ■ S, 

6-Li3 = {Lp(g)Lp + Lfs®l + l(E)Lj3)-6. 

In the terminology of group actions (jProcesi , 2007 ) , this lemma tells us the rule for how the two 
operators "intertwine" . It is exactly this relation that we will exploit to show how to generalize 
from a Markov model on a tree to a model on a general split system. 

Given a linear operator X on V, we define a linear operator X*^*^ on V (i) V iS5 . . . (gi V as the 
tensor product 



X« 1 ® 1 



1, 



where X appears in the i*'* slot of the tensor product. Further, for a subset A C [n] :— {1, 2, . 
we define 



4, 



(i) 



For example, if we take n — 5, we have 

X^^^) = (l(X)X(8)l(g)l(g)l)-(l(g)l(g)l(K)l(8)X) = l(8)X®l(g)l(g)X. 



Presently we will show how the interaction of 6 with La naturally produces terms such as La 
(and similar for L/3). 

Lemma 3.2. As linear operators from V to V (E) V (E) V , 

1(S)S-S = S(E)1-S. 

Proof. We have 

1^6-6-\i) = l(i)6 - (g) = \i) (g) ® \i)) = \i) |z) (g) |z) := . 

Similarly, 

5 ® 1 • (5 • |i) = (5 (g) 1 • (g) = (I?) g) g) |z) = \i) g) |z) g) \i) = . 



(A) 



□ 



Using this lemma, we can recursively define 

(5*+i : = (5g)lg)lg)...l-(5* = lg)(5g)lg)...g)l-(?' = ... = lg)l 
with := S. The action of the o perator ( 5"~^ t aking V to V (S^ V 



1(E) S-S' 



the "n-taxon proce ss" as defined in 
formalism given in lBashford et al\ (|2004l ) 



. . ®V generates exactly 
BrvantI (|2009l ). which, in turn, is completely equivalent to the 



If we note that 5 • 1 = 1 g) 1 • 5 and consider Lemma [5TT1 we see that, for a; = a or /3, we have 

6"^ ■ L^ : = S ^ 1 ■ d ■ Lx 

= S (S) 1 ■ {La; (g) La; + 1 (E) Lx + Lx (g) 1) ■ S 

= [{La ® La ® La + La ® I ® La + I ® La; ® La;) + 

l®l®La; + {La®La®l+La®l®l + l®La®l)]-5'^ 

E 4^^ I • 



Generalizing this result we have: 
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Lemma 3.3. 

6-'-L^=( LiA-S--\ (5) 

Proof. The proof is by induction. We have shown that the result is true for n — 3. Assuming that 
it is true for some n > 3, we have 

5" . = (,5 ® 1 . . . 1) • I ^i^' I • 

\AC[n],A^(ll / 

= ((5 ® 1 ® . . . ® 1) • I J2 1 ® 4^' + J2 ® ■ 

\AC[n~l],A^iD AC[n-l] J 

^AC[n~l],A^il 



AC[n-ll 



yA<Z[n+l],A^(tl 



□ 



(6) 



For the two state model, recall that for n > 1 we have 
If we define 

then Lemma 13.31 implies that wc have the intertwining 

We also note that we have the recursion 

4"! =L^® 4""^' + 1 ® 4""^' + (g) 1 (g) 1 (g) . . . (g) 1, 
and (after a little effort) it follows by induction on n that 

/ fi[n]A^ _ fiNri[ri] _ _ ( oNA^ _ ri[n] riN _ _oM 

Inspection reveals that, for all n, £L"' and nj^*' satisfy exactly the same algebraic relations as Lo 
and Lp that were given in ([2]). It follows immediately that, for n > 1 we have 

(«£["! + = + («4'1 + /34"1) , 

and 

= (-l)"-'(a + (a4"l + /3£f 1) • 5"-^ 

Putting this together we see that 
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Figure 2: A rooted tree on four leaves. 



Lemma 3.4. 



• exp [aLo, + PLfs] = exp + P^^n 



In order to put all of the above to work, we require one final lemma regarding tensor products 
and the exponential map: 

Lemma 3.5. Given any two linear operators X and Y , we have 
Proof. Consider 

e-^ (g) 1 = (1 + X + + . . .) (g) 1 = 1 ® 1 + X ® 1 + i (X (g) 1) V . . . = e^®\ 
which implies that 

(g = ® 1 ■ 1 ® = e^^^ ■ e"®^ = e^®i+i®^, 
where the last identity follows because X ®\ and 1 g) F commute. □ 

4 Alternative presentation of Markov models on trees 

Consider the tree presented in Figure^] Suppose we are given a root distribution \-k) := tt^ 
a rate matrix Q = aL^ + PLp^ and edge weights ti,T2, T3, T34 and T234. The phylogenetic tensor 
corresponding to this tree can be generated as 

P = (g 6^"'= (g 6*5^^ g) e^''" ^(glgxJ-lgjlg) e^''''^ • 1 g) (5 • 1 g) e'^^^^'' ■5-\-k) . 

If we set P = ^ J, J Pijki \ijkl) and interpret piju as the probability of observing the pattern ijkl 
at the leaves of the tree, we see that this t ensor is equivalent to specifying a joint distribution in 



the normal way (see lSemple fc Steell (|2003f) for example). 



By setting Q = aLa + PLp and applying Lemma 13.41 we find that 

([2] [2] ^ 
^ ^ ^ ^ ^ . • 1 ® l(g(5. 

Now, we note that 

lg)l®(5-lg)(5=[l«)(lg)(5)]-l(g)(5 = lg)(lg)(5-(5) = l®(5^, 
so again applying Lemma 13.41 we find that 

I® 6^- 1® e«^^^^ = 1 ® ei'-^'^'+P^fh^- . 1 ^ 
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We also set 
Thus 

P = e«^^ ® eQ^- ® e'^^^ ® e'?^^ ■ 1 ® 1 ® . 1 (g> ^i'-^'^'+^^fh^^ . | j3^^ ^ 

with I'^'^tt) := (5^ • |7r) = po |0000) + pi Finally, by applying Lemma [3.51 multiple times, we 

find that we can write 

P = exp [tiTZi + T27^2 + T^TZs + t^TZ^] ■ exp [r347^34] • exp [r2347?.234] • |<5^7r) , (7) 

where 

7^l = a (La (g) 1 (g) 1 (g) 1) + /3 (L/j (g) 1 (g) 1 (8) 1) , 
7^2 = a (1 g) La g) 1 ® 1) + /3 (1 (g g) 1 g) 1) , 
Tea = a (1 g) 1 g) La g) 1) + /3 (1 (g 1 g) L^ g) 1) , 
= a (1 g) 1 g) 1 g) La) + /3 (1 g) 1 g) 1 g) L^) , 

7e34 = a (l g) 1 g) £[^2]^ + /3 (l g) 1 g) £[5^1) , 
7e234 = a (1 g) £L'l) + /3 (1 ® . 

Suppose instead of the tree above we considered the quartet given in Figure[31 The phylogenetic 
tensor corresponding to this tree is given by 

P' = g) e^""^ g) e'S^^ g) e'S^'' • (5 g) (5 • e'^^^^ g) e'^^"* • (5 • |7r) . 

By a similar argument to the one just given, it is possible to show that this tensor can be re- 
expressed as 

P' = exp [tiTZi + T2TZ2 + t^TZ^ + TiTZi] ■ exp [ti27?.i2 + 'r347?.34] • |(5^7r) , 

with 

TZi2 = a (£[,21 ® f ® 1) + ^ (^£f 1 (g 1) . 
We can extend our definition © of £L"' to arbitrary subsets by taking 

ACB.Ayiii 

for all A Q [n]. Now label the elements of yl as A = {ai, 02, ... , a|yi|} ^^^d consider a permutation 
tr G ©n such that cr(a.i) = z (obviously such a permutation always exists). If we allow a to act on 
Y^n j-jy permuting tensor factors, it is clear that 

(t(£^) ^^^ll^ld"-!'^!!). 

From this we can conclude that for fixed A the operators |£a,£^| also satisfy the exact same 
algebra as {LajLp}: 

/ (iA\'^ nA(iA (xA / nA\'^ nA riA nA 

for all AC- [n]. Using this result, we can unify the expressions for the rate matrices above by 
defining 

TZa :=a£f +/?£l 
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Figure 3: Alternative tree on four leaves. 



Evidently An B implies that [TZajT^b] = and we see that there are several ways we can 
express our two phylogenetic tensors. For instance the following presentations are all equally valid: 

P = exp [nUi + T2'R.2 + TsRs + TaHa] ■ exp [t347?.34] • exp [t234'/^234] • |<5^7r) 
= exp [t27^2 + TsT^s + T^Tli] ■ exp [t-JIi + T347?.34] • exp [T2SiR-2Si] ■ I ^^71") 

= exp [t27^2 + TsUi + TiTli] ■ exp [r347^34] • exp [t{Ri + T2SiR-2Si] ■ I^^ti") 
= exp [T37I3 + r47^4] • exp [r347?.34] • exp [nT^i + r27?.2 + T2SiR-23i] ■ I^^tt) . 

The content of our main theorem is that there is a canonical choice of presentation of a 
phylogenetic tensor for arbitrary trees. Consider the presentations of the quartet tensors discussed 
above: 

P = exp [nT^i + r27^2 + t^TIz + T^TZi] ■ exp [r347^34] • exp [r234'7^234] • |<5^7r) , 
P' = exp [tiTZi + T2'R2 + T37?.3 + r47?.4] • exp [ti2'R-i2 + T34 7^.34] • |(5^7r) . 

Theorem 4.1. Consider a rooted tree with n leaves T = {^1, ^2, • • • j ^2n+2}, where Ai C [n]. 
Given a root distribution n, any rate parameters a and (3 and weights {taiiTa^, ■ ■ ■ ■,TA2n+2}j 
phylogenetic tensor (or joint distribution) P at the leaves of this split system can be expressed as 

P = exp [Afi] • exp [A'2] • . . . • exp [X^_r] ■ j^"" V) , 

with 

\5^-\) := ^"-1 . Itt) = po |00 . . . 0) + pi 111 . . . 1) , 

and 

Xi = ^ taT^a- 

A,\A\=i 

Proof. The proof is by induction. Clearly a phylogenetic tensor on two leaves can be placed into 
the required form: 

P = cxp [Tl7^l +T27^2] • \Sn) . 

Assume that for A; > 2 any phylogenetic tensor on k leaves P^''^ can be expressed in the required 
form: 



PC^^ = exp [rl7^l + T2T12 + ... + T„7^„] - exp 




• . . . • exp 








A,\A\=2 




A,\A\=k-l 





Without loss of generality wc generate a phylogenetic tensor on fc + 1 leaves, p(*'+^), by branching 
P^*^^ at the leaf k. This is expressed algebraically by 

p(fe) ^ p(fe+i) = (1 (g) 1 . . . 1 (g) ^) . pW. 
For an arbitrary subset Ac [k], we have 

1 (g) 1 (g) . . . (g) 1 O ^ • 71^ = 7^A' 
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where 



Pushing right through we have 



A' = 



Au{fc + l} , if fc e A, 
A , otherwise. 



PC^+I) = exp 



A.\A\ = 1 



■ exp 



yl.|A|=2 



• exp 



A.|A|=n-l 



Now consider, for a given 1 < i < fc — 1, the term 



exp 



A,\A\=i 



As we are deahng with a tree T, it is only possible that one (and only one) of the subsets (-Bi+i, 
say) in the summation has cardinality i + 1. Additionally, B is disjoint from all of the other 
subsets, so we may write 



exp 



y^ taT^-A' 

A,\A\=t,A^$ 



exp 



y^ taTIa 

A,\A\=i,A=/^$,A^B 



exp [tbTIb^+i] ■ 



Exactly the same argument is valid for the i + 1 term: 



exp 



y^ taTIa' 

A.,\A\=i+l,A=jL(tl 



exp 



y^ taTIa 

A,\A\ = i+l,A^lll.A^B,+2 



exp [tbTZb+i] ■ 



Thus we can express the product of these two terms as 



exp 



y^ ta'R-A' 



A,\A\=i 



exp 



y^ ta'R'A' 



AAA\=i+l 



exp 



E 



exp 



Ts + TIb,+i 



E 

A,|A|=i+l,A#0,A5^B,+2 



■ exp [tbTZb+2] 



Continuing in this way we can place pC^+i) in the required form and the theorem follows by 
induction on k. □ 

Recall that, in the basis {\i1i2 ■ ■ ■ in)} , a phylogenetic tensor P can be expressed as 

E Ptl^2■■■^,^\^li2■■■in) , 

11,12, 

where Pi-^i2...i„ is the probability of observing the pattern iii2 ... in at the leaf vertices. 

From a biological perspective, it is apparent that the form given in Theorem 14. II that utilizes 
a cardinality ordering is somewhat mysterious. However a little thought using commutivity (or 
otherwise) of the various operators shows that it is not so much the cardinality that matters, 
but it is that the operators that arise independently across branches of the tree are necessarily 
commutative and, conversely, those that do not commmute necessarily have non-zero intersection 
and hence are not independent. Noting that there is some freedom in the final expression, we see 
that the cardinality ordering is simply a nice way of unifying the description for arbitrary trees. 
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It is clear that if we take P as in Theorem 14. II in the case that the spht system S is compatible, 
we have a probability distribution on a tree identical to the standard presentation usually given 
in phylogenetics. However, the construction we have given naturally generalizes to a model on 
the most general split systems with trees occuring as sub-models (set the weights for incompatible 
splits equal to zero). There is also an obvious generalization to the case where each split has 
a unique rate matrix - simply give additional split labels to the rate parameters: a — > ua and 

In the next section we explore the a = f3 case in detail. 



5 (In)consistency with the Hadamard conjugation 

In this section we consider the "binary-symmetric" case where a — f3 — ^, so that we can write 



where K 



1 

1 



is the permutation matrix taking |0) 



For any tree T represented as a split system, it is shown in iBashford et ali (|2004() that we can 
write 



P ~ exp 



.xeT 



where {wx\x^T is any set of edge weights on T. We will refer to this constructing of a phylogenetic 
tensor as the "i^T-representation" . 

On the other hand, we have shown in Theorem 14.11 that for a — [} = \ we can write 



with 



and 



P = exp [A-i] • exp [^"2] • . . . • exp {X^\ , 



A.\A\=i 



We will refer to this construction of a phylogenetic tensor as the "^-representation" . We will show 
that for a tree these two representations are exactly equal, but for arbitrary split systems this is 
not the case. 

We will find it convenient to label the vector |iii2 ■■•*«) by the subset A C \n\ defined by 
setting J £ A if and only if ij = 1. For example if n = 6, we have 

|0) = 1000000) , 
|[6]) = |{1,2,3,4,5,6})= 1111111), 
|{2,3,4}) = 1011100), 
|{5}) 1000010). 

Lemma 5.1. // 

(a.) A C B, it follows that TZa \B) = \B - A) - \B) , 
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1 \B) 



(h.) Ar\B^%, it follows that TZa \B) = \B U A) - \B). 
If either case (a.) or (h.), we have 

TZa \B) = (i^(^) - 1 ® 1 (g) . . . 

Proof. For all A C B we have 

i^(^) \B) = \B-A). 

On the other hand we know that 

^hI0) = IW)-|0), 

^Hl[n]) = |0)-|[n]). 
For A C B C [n] it is always possible to permute tensor factors to write 

|i3) = IM)®|S'), 
where riA = \A\ and B' C [n] - A with B' := B - A. Thus 

nA\B) = {n[,,,]\[nA]))^\B') 

= (10) -IK]))® IB') 

= \B-A)- \{B-A)LiA) 
= \B-A)-\B), 

which proves the lemma for A C B. The A n i? = case follows from a similar argument. 



□ 



For two subsets A,B taken from a compatible split system with \A\ — \B\ it is the case that 
A n i? = 0, which in turn implies that [TZa, T^b] — 0. Thus we can make the replacement 



exp [Xi] 



exp 



A,\A\=i 



exp [taTZa] 

A,\A\=i 



where by commutivity the product can be ordered in any way we please. 
Thus, in the ^-representation, we see that for a tree we can write 

P = . . . exp [TAaTZAa] cxp [rA^T^-Aa] exp [tAj^TZAi] V) , 

with, due to the fact we are dealing with a tree, either Ai n Ai^i ~ % or A^+i C Ai. Noting that 
|(5"^^7r) = TTo |0) + TTi [[n]) and repeated application of Lemma [5. II then gives: 

Theorem 5.2. For compatible split systems, the "^-representation" and the "K -representation" 
give rise to identical phylogenetic tensors. 



For arbitrary split systems, however, this is not true, as the example in the next section shows. 



6 Phylogenetic networks and "epochs" 

Consider the three taxa phylogenetic tree given in Figure |6] (a). As was proved above, if we 
take the binary symmetric model we get an identical probability distribution if we use either 
the ^-representation Pe = cxp [tiTZi + T2IZ2 + taIZz] ■ exp [ri27?.i2] |<5^7r) or the if-representation 
Pk = e-^ exp [riK^i) -|- Ta-fiT^^) + tsK'^^') + riaif^^^)] \6^tt), with X = ti + T2 + T3 + T12. 

We would like to introduce the additional parameter T23 associated with the "imaginary" split 
1|23 to these probability distributions. We will do this in a way which is consistent with the design 
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Figure 4: A three taxa tree (a.) is modified by the introduction of the addtional split {23} in (b). 



given in Figure|6](b), where the evolutionary history in broken up into three epochs: a. divergence 
of taxa 3 away from 1 and 2, b. convergent evolution of taxa 2 and 3, with independent divergence 
of taxa 3, and c. independent divergence of all taxa. 

To model this situation we must introduce the additional edge to each representation. To make 
the presentation as simple as possible, we set a molecular clock on the model such that T2 = ti 
and r3 — T12 + ti, and we introduce a scaling parameter 6 G [0, 1] to control the length of the 
second epoch as a proportion of the third epoch by setting r23 = ti9 . As all operators in the 
iiT-representation commute, the only choice available in this case is to take 



P'li = e^^exp 



This is exactly consistent with the generalizations given in iBrvant (l2005bl . [2009h . 

For the ^-representation we do not have commutivity of the operators 7?.2, T^a, i?.i2 with the 
new operator 72.23, thus we need to proceed more carefully as there is some choice in how the extra 
edge is introduced. Using the diagram and its three epochs as a guide, we take 

P's, = exp [Ti(l-e') (7^l +712+ 7^3)] • exp [nO (7^l + 1123)] ■ exp [ri2 (Us + TZi2)] \S^tt) . 

For clarity of comparison, we write the if-representation in epoch form: 



= exp 



Tl 



(1-9) (if + if + if 



■ exp 



Tit 



(if 



if (23) 



•exp [ti2 (if(3)+if(i2))l \S\) 



Now consider the state of the probability distribution at the beginning of epoch 2. As we are 
dealing with the binary symmetric model, it is clear that the probability of the any state \ijk) is 
invariant to permutation of the states 0^1. Also, the structure of the tree up to the start of 
epoch 2 implies that the probability of any state of the form \ijk) where i ^ j is of probability 
zero. Thus we can assume at the start of epoch 2 that the distribution is of the form 

P = {1- <z)i (1000) + |111)) + (1001) + |110)) , 

where, because we are considering a continuous-time model, q £ [O, 
Considering the definitions 



and 



7?.23 = 1 ® (i^a + ® 1 + 1 ® + i/3 ® i/3 + i^/3 «) 1 1 (81 , 



if (23) ^ I K (g, K ^ TZ23 + I ® {La Lp + Lp ($, La) 



it follows that transition rates between the four existing states in the two cases are given by the 
two graphs in Figure [51 where all transition rates are equal. The crucial thing to note is that 7^23 
"corrects" patterns that are inconsistent with the split 1|23, whereas if (^3) simply premutes these 
two states. 
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^|a;00)^ |a;00) \xOl) 

1.01) n MO) n n 

(a) V|a;ll)^ (b) |;^ii) |a;io) 

Figure 5: Transition rates for states \xyz) under (a.) 7^.23, and (b.) K'-'^'^^ — 1 ® 1 (8i 1- 

It is now clear the there is quite a marked difference between the £- and if-representations. 
The £ representations introduces a natural notion of the "coming together" of taxa. In fact it is 
easy to see directly from the diagram that in the limit of extension of the edge T23 to infinity that 
the probability distribution will converge to 

^ = ((1 - h) \ + h) (1000) + |111)) + \q\ (1011) + |1GG)) 
= (1 - \q) \ (1000) + |111)) + \q\ (1011) + |100)) , 

which is consistent with the probability distribution that would arise under a tree where taxa 1 
has diverged from 2 and 3, but there has been zero divergence of taxa 2 and 3 themselves. This 
behaviour is of the course the reasoning behind the way we have chosen to draw our diagram 
Figure H] (b) . 

The ^-representation cannot achieve this type of convergence, with its limiting state being 
(1 - q)\ (|000) + |011) + |111) + |100)) + q\ (|001) + |010) + |11G) + |101)) . 



7 Conclusion 

In this paper we have shown how to express the two state continuous-time general Markov model 
on trees in such a way that allows extension to arbitrary split systems and even more general 
network models. By reviewing the lemmas in Section [2] it is clear that the results extend easily 
to the general Markov model with more character states. However, we defer confirmation of this 
observation to future work. 

In Section [5] we showed that our discussion gives rise to phylogenetic models for the general 
Markov model that are identical to previous approaches only on frees, and in Section |6] we give 
a simple example that shows how our approach allows for convergent evolution of previously 
divergent lineages (a structural property that was previously unobtainable). 

Besides its theoretical interest, we expect that the ability to model convergent evolution in 
this way will have a significant application where it is known that particular datasets exhibit non- 
treelike behaviour due to population genetic properties such as incomplete lineage sorting and 
other effects that confound strictly treelike models (as discussed in Section [IJ. We suspect that 
exploration of the relation between the network models that arise in our discussion and simple 
models of population genetics is likely to yield significant additional insight. 

Finally, compa rison of our network m odels to the distribution space generated by the so-called 
"mixture models" (jPagel &: Meadel . l2004l) is also in need of investigation. For example, comparison 



of a mixture of the trees 12 1 3 and 1|23 to our network example given in Section [5] should yield 
significant theoretical insight into the biological meaning and plausibility of these differing model 
classes. Careful scientific thought is required to tease out what biological processes are explicitly 
(or implicitly) being modelled by either of these approaches. 

Of course these exciting possibilities must be tempered by analysis of the identifiability (or 
otherwise) of the models that arise by taking more general networks. Establishing identifiability is 
not only essential from a statistical inference point of view, but, in this case, may lead to natural 
restrictions of the types of network that can be realistically used for phylogenetic inference. 
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